Goal of the notebook: I use Bokeh to visualize data I have collected. The data is a CSV file built from generated questions that were run through different question-answering models, based on a lunch form.
Side note: Later in the notebook I switch from Bokeh to Plotly. I switched because Plotly is easier to use, and it is also a tool that is new to me.
import pandas as pd
import os
import numpy as np
# Imports for the Bokeh pie chart and treemap
from math import pi
from bokeh.palettes import Category20c
from bokeh.transform import cumsum
from bokeh.plotting import figure, output_notebook, show
from squarify import normalize_sizes, squarify
from bokeh.sampledata.sample_superstore import data
from bokeh.transform import factor_cmap
import plotly.express as px
import plotly
The data consists of CSV files built from generated questions that were run through different question-answering models. It combines the answers the different models predicted for each label of a form.
df = pd.read_csv(r'C:\Users\victo\source\repos\Semester 7\JupyterLab\Group\Question Generator\csv_ouput\df_merged.csv', index_col=[0])
# The first CSV column is the saved index, so index_col=[0] is used; the alternative is to drop 'Unnamed: 0' by name:
# df.drop('Unnamed: 0', axis=1, inplace=True)
df.head()
| | label | questions | answer | score | model | percentage | actual_answer |
|---|---|---|---|---|---|---|---|
| 0 | Number of Attendees | what Number of Attendees? | 15 | 62.77 | model 1 | 1.539416 | 15 |
| 1 | Number of Attendees | who Number of Attendees? | 15 | 67.06 | model 1 | 1.644627 | 15 |
| 2 | Number of Attendees | where Number of Attendees? | 15 | 46.99 | model 1 | 1.152416 | 15 |
| 3 | Number of Attendees | when Number of Attendees? | 15 | 55.51 | model 1 | 1.361367 | 15 |
| 4 | Number of Attendees | why Number of Attendees? | 15 | 55.14 | model 1 | 1.352293 | 15 |
In this chapter I visualize the different labels that are used in the dataset.
x = df.label.value_counts()
data = pd.Series(x).reset_index(name='value').rename(columns={'index': 'country'})
data['angle'] = data['value']/data['value'].sum() * 2*pi
data['color'] = Category20c[len(x)]
data
| | country | value | angle | color |
|---|---|---|---|---|
| 0 | Number of Attendees | 60 | 0.628319 | #3182bd |
| 1 | Budget | 60 | 0.628319 | #6baed6 |
| 2 | Organizer | 60 | 0.628319 | #9ecae1 |
| 3 | Contact Details | 60 | 0.628319 | #c6dbef |
| 4 | Date | 60 | 0.628319 | #e6550d |
| 5 | End Time | 60 | 0.628319 | #fd8d3c |
| 6 | Start Time | 60 | 0.628319 | #fdae6b |
| 7 | Food Allergies | 60 | 0.628319 | #fdd0a2 |
| 8 | Food Diets | 60 | 0.628319 | #31a354 |
| 9 | Location | 60 | 0.628319 | #74c476 |
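The `angle` column above is each label's share of all rows, scaled to a full circle of 2π radians. A minimal sketch of that arithmetic in plain Python, assuming ten labels with 60 rows each as in the table (the helper name `wedge_angles` is hypothetical and not part of Bokeh):

```python
from math import pi, isclose

def wedge_angles(counts):
    """Map each count to its wedge angle in radians; the angles sum to 2*pi."""
    total = sum(counts.values())
    return {name: value / total * 2 * pi for name, value in counts.items()}

# Assumed input: the ten form labels with 60 rows each, as in the table above.
labels = ["Number of Attendees", "Budget", "Organizer", "Contact Details", "Date",
          "End Time", "Start Time", "Food Allergies", "Food Diets", "Location"]
counts = {label: 60 for label in labels}

angles = wedge_angles(counts)
print(round(angles["Budget"], 6))  # 0.628319
```

Because every label has the same count, every wedge gets the same angle, which is why the pie chart is split into ten equal slices.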
p = figure(height=350, title="Pie Chart", toolbar_location=None,
tools="hover", tooltips="@country: @value", x_range=(-0.5, 1.0))
p.wedge(x=0, y=1, radius=0.4,
start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
line_color="white", fill_color='color', legend_field='country', source=data)
p.axis.axis_label = None
p.axis.visible = False
p.grid.grid_line_color = None
output_notebook()
show(p)
A little background on the data: it consists of questions created for a lunch form. The lunch form has several kinds of labels, which are covered in more detail below. The pie chart above shows the different form labels and that each one accounts for an equal share of the rows.
def treemap(df, col, x, y, dx, dy, *, N=100):
    # Keep the N largest rows and scale their values to the area of the dx-by-dy rectangle
    sub_df = df.nlargest(N, col)
    normed = normalize_sizes(sub_df[col], dx, dy)
    # squarify returns one rectangle (x, y, dx, dy) per value
    blocks = squarify(normed, x, y, dx, dy)
    blocks_df = pd.DataFrame.from_dict(blocks).set_index(sub_df.index)
    return sub_df.join(blocks_df, how='left').reset_index()
df_copy = df.copy()
df_copy.head()
| | label | questions | answer | score | model | percentage | actual_answer | comparing_answers |
|---|---|---|---|---|---|---|---|---|
| 0 | Number of Attendees | what Number of Attendees? | 15 | 62.77 | model 1 | 1.539416 | 15 | True |
| 1 | Number of Attendees | who Number of Attendees? | 15 | 67.06 | model 1 | 1.644627 | 15 | True |
| 2 | Number of Attendees | where Number of Attendees? | 15 | 46.99 | model 1 | 1.152416 | 15 | True |
| 3 | Number of Attendees | when Number of Attendees? | 15 | 55.51 | model 1 | 1.361367 | 15 | True |
| 4 | Number of Attendees | why Number of Attendees? | 15 | 55.14 | model 1 | 1.352293 | 15 | True |
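# True when every character of actual_answer occurs somewhere in answer (a character-wise check, not a substring match)
df['comparing_answers'] = df.apply(lambda row: all(i in row.answer for i in row.actual_answer), axis=1)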
df.head()
| | label | questions | answer | score | model | percentage | actual_answer | comparing_answers |
|---|---|---|---|---|---|---|---|---|
| 0 | Number of Attendees | what Number of Attendees? | 15 | 62.77 | model 1 | 1.539416 | 15 | True |
| 1 | Number of Attendees | who Number of Attendees? | 15 | 67.06 | model 1 | 1.644627 | 15 | True |
| 2 | Number of Attendees | where Number of Attendees? | 15 | 46.99 | model 1 | 1.152416 | 15 | True |
| 3 | Number of Attendees | when Number of Attendees? | 15 | 55.51 | model 1 | 1.361367 | 15 | True |
| 4 | Number of Attendees | why Number of Attendees? | 15 | 55.14 | model 1 | 1.352293 | 15 | True |
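The `comparing_answers` check iterates over the characters of `actual_answer`, so it is true whenever each of those characters appears anywhere in `answer`, regardless of order. The sketch below contrasts that with a stricter substring test. The function names and the example strings are made up for illustration and do not come from the dataset:

```python
def chars_contained(answer: str, actual_answer: str) -> bool:
    """Same logic as the lambda above: every character of actual_answer occurs in answer."""
    return all(ch in answer for ch in actual_answer)

def substring_contained(answer: str, actual_answer: str) -> bool:
    """Stricter alternative: actual_answer must appear contiguously in answer."""
    return actual_answer in answer

print(chars_contained("15", "15"))      # True
print(chars_contained("51", "15"))      # True: same characters, different order
print(substring_contained("51", "15"))  # False
```

Whether the looser character-wise check is acceptable depends on how the predicted answers are formatted, so the counts that follow should be read with that in mind.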
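# Keep only the correctly predicted rows; .copy() lets columns be added later without a SettingWithCopyWarning
df_correct_prediction = df[df.comparing_answers].copy()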
df_correct_prediction.head()
| | label | questions | answer | score | model | percentage | actual_answer | comparing_answers |
|---|---|---|---|---|---|---|---|---|
| 0 | Number of Attendees | what Number of Attendees? | 15 | 62.77 | model 1 | 1.539416 | 15 | True |
| 1 | Number of Attendees | who Number of Attendees? | 15 | 67.06 | model 1 | 1.644627 | 15 | True |
| 2 | Number of Attendees | where Number of Attendees? | 15 | 46.99 | model 1 | 1.152416 | 15 | True |
| 3 | Number of Attendees | when Number of Attendees? | 15 | 55.51 | model 1 | 1.361367 | 15 | True |
| 4 | Number of Attendees | why Number of Attendees? | 15 | 55.14 | model 1 | 1.352293 | 15 | True |
models = sorted(df['model'].unique())
print(models)
['model 1', 'model 2', 'model 3', 'model 4', 'model 5', 'model 6']
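# Bokeh's sample superstore data, used to try out the treemap example.
# Re-import it because the name `data` was reused for the pie chart above.
from bokeh.sampledata.sample_superstore import data
data = data[["City", "Region", "Sales"]]
regions = ("West", "Central", "South", "East")
sales_by_city = data.groupby(["Region", "City"]).sum(numeric_only=True)
sales_by_city = sales_by_city.sort_values(by="Sales").reset_index()
sales_by_region = sales_by_city.groupby("Region").sum(numeric_only=True).sort_values(by="Sales")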
data.shape
(9994, 3)
sales_by_region
| Region | Sales |
|---|---|
| South | 391721.9050 |
| Central | 501239.8908 |
| East | 678781.2400 |
| West | 725457.8245 |
# Sum the numeric columns per (model, label); summing the boolean comparing_answers column counts the correct predictions
score_by_label = df_correct_prediction.groupby(["model", "label"]).sum(numeric_only=True)
score_by_label = score_by_label.sort_values(by="comparing_answers").reset_index()
score_by_model = score_by_label.groupby("model").sum(numeric_only=True).sort_values(by="comparing_answers")
score_by_model
| model | score | percentage | comparing_answers |
|---|---|---|---|
| model 3 | 2557.03 | 146.987787 | 39 |
| model 2 | 2195.60 | 90.754293 | 41 |
| model 6 | 1213.33 | 43.869068 | 47 |
| model 1 | 1035.87 | 40.455861 | 49 |
| model 5 | 1942.50 | 105.797038 | 57 |
| model 4 | 2117.84 | 102.528483 | 62 |
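The `comparing_answers` column in the table above is the number of correct predictions per model. The same counting idea can be sketched without pandas. The rows below are hypothetical and are not taken from `df_merged.csv`:

```python
from collections import Counter

# Hypothetical (model, label, is_correct) rows, for illustration only.
rows = [
    ("model 1", "Date", True),
    ("model 1", "Budget", False),
    ("model 2", "Date", True),
    ("model 2", "Budget", True),
    ("model 2", "Location", True),
]

# Keep the correct rows, then count per (model, label) and per model.
correct = [(model, label) for model, label, ok in rows if ok]
per_model_label = Counter(correct)
per_model = Counter(model for model, _ in correct)

# Sort ascending by count, like sort_values(by="comparing_answers").
ranking = sorted(per_model.items(), key=lambda item: item[1])
print(ranking)  # [('model 1', 1), ('model 2', 3)]
```

The last entry of `ranking` is the model with the most correct predictions, which corresponds to the bottom row of the sorted table.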
x, y, w, h = 0, 0, 800, 450
blocks_by_model = treemap(score_by_model, "comparing_answers", x, y, w, h)
blocks_by_model
| | model | score | percentage | comparing_answers | x | y | dx | dy |
|---|---|---|---|---|---|---|---|---|
| 0 | model 4 | 2117.84 | 102.528483 | 62 | 0.000000 | 0.000000 | 322.711864 | 234.453782 |
| 1 | model 5 | 1942.50 | 105.797038 | 57 | 0.000000 | 234.453782 | 322.711864 | 215.546218 |
| 2 | model 1 | 1035.87 | 40.455861 | 49 | 322.711864 | 0.000000 | 260.338983 | 229.687500 |
| 3 | model 6 | 1213.33 | 43.869068 | 47 | 322.711864 | 229.687500 | 260.338983 | 220.312500 |
| 4 | model 2 | 2195.60 | 90.754293 | 41 | 583.050847 | 0.000000 | 216.949153 | 230.625000 |
| 5 | model 3 | 2557.03 | 146.987787 | 39 | 583.050847 | 230.625000 | 216.949153 | 219.375000 |
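Each block's area (`dx * dy`) in the table above is proportional to that model's number of correct predictions. As far as I understand it, this is what the `normalize_sizes` step does: it rescales the values so that they sum to the area of the 800 by 450 canvas. The sketch below reproduces only that scaling step in plain Python. The helper `normalize_areas` is a hypothetical stand-in, and the placement of the rectangles is not covered:

```python
def normalize_areas(sizes, dx, dy):
    """Scale sizes so that they sum to the area of a dx-by-dy rectangle."""
    total = sum(sizes)
    return [size * dx * dy / total for size in sizes]

# Correct-prediction counts per model, taken from the score_by_model table.
counts = {"model 4": 62, "model 5": 57, "model 1": 49,
          "model 6": 47, "model 2": 41, "model 3": 39}

areas = dict(zip(counts, normalize_areas(list(counts.values()), 800, 450)))

# The block for model 4 in the table has dx=322.711864 and dy=234.453782.
table_area = 322.711864 * 234.453782
print(round(areas["model 4"], 1), round(table_area, 1))
```

The two printed numbers should agree up to rounding, which confirms that the block sizes in the treemap encode the counts and nothing else.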
blocks_by_region = treemap(sales_by_region, "Sales", x, y, w, h)
dfs = []
for index, (Region, Sales, x, y, dx, dy) in blocks_by_region.iterrows():
    # Use a separate name so the main DataFrame `df` is not overwritten
    region_df = sales_by_city[sales_by_city.Region == Region]
    dfs.append(treemap(region_df, "Sales", x, y, dx, dy, N=10))
blocks = pd.concat(dfs)
blocks_by_region
| | Region | Sales | x | y | dx | dy |
|---|---|---|---|---|---|---|
| 0 | West | 725457.8245 | 343.703270 | 0.000000 | 252.640624 | 450.000000 |
| 1 | East | 678781.2400 | 596.343894 | 0.000000 | 236.385508 | 450.000000 |
| 2 | Central | 501239.8908 | 832.729402 | 0.000000 | 310.973868 | 252.595298 |
| 3 | South | 391721.9050 | 832.729402 | 252.595298 | 310.973868 | 197.404702 |
def treemap(df, col, x, y, dx, dy, *, N=100):
    sub_df = df.nlargest(N, col)
    normed = normalize_sizes(sub_df[col], dx, dy)
    blocks = squarify(normed, x, y, dx, dy)
    blocks_df = pd.DataFrame.from_dict(blocks).set_index(sub_df.index)
    return sub_df.join(blocks_df, how='left').reset_index()
x, y, w, h = 0, 0, 800, 450
blocks_by_model
| | model | score | percentage | comparing_answers | x | y | dx | dy |
|---|---|---|---|---|---|---|---|---|
| 0 | model 4 | 2117.84 | 102.528483 | 62 | 0.000000 | 0.000000 | 322.711864 | 234.453782 |
| 1 | model 5 | 1942.50 | 105.797038 | 57 | 0.000000 | 234.453782 | 322.711864 | 215.546218 |
| 2 | model 1 | 1035.87 | 40.455861 | 49 | 322.711864 | 0.000000 | 260.338983 | 229.687500 |
| 3 | model 6 | 1213.33 | 43.869068 | 47 | 322.711864 | 229.687500 | 260.338983 | 220.312500 |
| 4 | model 2 | 2195.60 | 90.754293 | 41 | 583.050847 | 0.000000 | 216.949153 | 230.625000 |
| 5 | model 3 | 2557.03 | 146.987787 | 39 | 583.050847 | 230.625000 | 216.949153 | 219.375000 |
# Lay out each model's labels inside that model's block
dfs = []
for index, (model, score, percentage, comparing_answers, x, y, dx, dy) in blocks_by_model.iterrows():
    df_score = score_by_label[score_by_label.model == model]
    dfs.append(treemap(df_score, "comparing_answers", x, y, dx, dy, N=10))
blocks = pd.concat(dfs)
blocks
| | index | model | label | score | percentage | comparing_answers | x | y | dx | dy | ytop |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 35 | model 4 | Date | 505.25 | 16.538516 | 10 | 0.000000 | 0.000000 | 104.100601 | 117.226891 | 117.226891 |
| 1 | 36 | model 4 | Number of Attendees | 522.17 | 12.806068 | 10 | 0.000000 | 117.226891 | 104.100601 | 117.226891 | 234.453782 |
| 2 | 37 | model 4 | End Time | 392.80 | 27.215032 | 10 | 104.100601 | 0.000000 | 109.305631 | 111.644658 | 111.644658 |
| 3 | 44 | model 4 | Start Time | 446.76 | 30.519520 | 10 | 213.406233 | 0.000000 | 109.305631 | 111.644658 | 111.644658 |
| 4 | 23 | model 4 | Food Allergies | 3.02 | 0.243340 | 7 | 104.100601 | 111.644658 | 69.558129 | 122.809124 | 234.453782 |
| 5 | 16 | model 4 | Budget | 0.00 | 0.000000 | 5 | 173.658731 | 111.644658 | 89.431880 | 68.227291 | 179.871949 |
| 6 | 11 | model 4 | Organizer | 222.36 | 12.361781 | 4 | 173.658731 | 179.871949 | 89.431880 | 54.581833 | 234.453782 |
| 7 | 13 | model 4 | Contact Details | 25.45 | 2.841099 | 4 | 263.090611 | 111.644658 | 59.621254 | 81.872749 | 193.517407 |
| 8 | 5 | model 4 | Food Diets | 0.03 | 0.003127 | 2 | 263.090611 | 193.517407 | 59.621254 | 40.936375 | 234.453782 |
| 0 | 32 | model 5 | Location | 512.78 | 57.877806 | 10 | 0.000000 | 234.453782 | 107.570621 | 113.445378 | 347.899160 |
| 1 | 29 | model 5 | Number of Attendees | 704.32 | 17.273245 | 9 | 0.000000 | 347.899160 | 107.570621 | 102.100840 | 450.000000 |
| 2 | 30 | model 5 | Date | 495.57 | 16.221657 | 9 | 107.570621 | 234.453782 | 113.898305 | 96.428571 | 330.882353 |
| 3 | 27 | model 5 | Start Time | 41.37 | 2.826109 | 8 | 221.468927 | 234.453782 | 101.242938 | 96.428571 | 330.882353 |
| 4 | 17 | model 5 | End Time | 0.00 | 0.000000 | 5 | 107.570621 | 330.882353 | 102.448211 | 59.558824 | 390.441176 |
| 5 | 18 | model 5 | Food Diets | 23.05 | 2.402243 | 5 | 107.570621 | 390.441176 | 102.448211 | 59.558824 | 450.000000 |
| 6 | 12 | model 5 | Budget | 0.00 | 0.000000 | 4 | 210.018832 | 330.882353 | 64.396018 | 75.802139 | 406.684492 |
| 7 | 8 | model 5 | Organizer | 165.40 | 9.195172 | 3 | 274.414851 | 330.882353 | 48.297014 | 75.802139 | 406.684492 |
| 8 | 6 | model 5 | Food Allergies | 0.01 | 0.000806 | 2 | 210.018832 | 406.684492 | 56.346516 | 43.315508 | 450.000000 |
| 9 | 7 | model 5 | Contact Details | 0.00 | 0.000000 | 2 | 266.365348 | 406.684492 | 56.346516 | 43.315508 | 450.000000 |
| 0 | 41 | model 1 | Number of Attendees | 603.97 | 14.812190 | 10 | 322.711864 | 0.000000 | 106.260809 | 114.843750 | 114.843750 |
| 1 | 42 | model 1 | End Time | 149.97 | 10.390627 | 10 | 322.711864 | 114.843750 | 106.260809 | 114.843750 | 229.687500 |
| 2 | 43 | model 1 | Date | 198.11 | 6.484800 | 10 | 428.972674 | 0.000000 | 154.078174 | 79.202586 | 79.202586 |
| 3 | 24 | model 1 | Start Time | 2.25 | 0.153704 | 7 | 428.972674 | 79.202586 | 113.531286 | 75.242457 | 154.445043 |
| 4 | 25 | model 1 | Contact Details | 72.74 | 8.120297 | 7 | 428.972674 | 154.445043 | 113.531286 | 75.242457 | 229.687500 |
| 5 | 9 | model 1 | Organizer | 8.75 | 0.486444 | 3 | 542.503960 | 79.202586 | 40.546888 | 90.290948 | 169.493534 |
| 6 | 1 | model 1 | Budget | 0.08 | 0.007798 | 2 | 542.503960 | 169.493534 | 40.546888 | 60.193966 | 229.687500 |
| 0 | 31 | model 6 | Date | 257.43 | 8.426541 | 10 | 322.711864 | 229.687500 | 110.782546 | 110.156250 | 339.843750 |
| 1 | 33 | model 6 | End Time | 90.44 | 6.266109 | 10 | 322.711864 | 339.843750 | 110.782546 | 110.156250 | 450.000000 |
| 2 | 34 | model 6 | Number of Attendees | 685.84 | 16.820028 | 10 | 433.494410 | 229.687500 | 149.556437 | 81.597222 | 311.284722 |
| 3 | 45 | model 6 | Start Time | 177.57 | 12.130341 | 10 | 433.494410 | 311.284722 | 87.974375 | 138.715278 | 450.000000 |
| 4 | 10 | model 6 | Organizer | 0.05 | 0.002780 | 3 | 521.468785 | 311.284722 | 61.582062 | 59.449405 | 370.734127 |
| 5 | 3 | model 6 | Budget | 0.00 | 0.000000 | 2 | 521.468785 | 370.734127 | 61.582062 | 39.632937 | 410.367063 |
| 6 | 4 | model 6 | Contact Details | 2.00 | 0.223269 | 2 | 521.468785 | 410.367063 | 61.582062 | 39.632937 | 450.000000 |
| 0 | 39 | model 2 | Number of Attendees | 755.63 | 18.531607 | 10 | 583.050847 | 0.000000 | 108.474576 | 112.500000 | 112.500000 |
| 1 | 40 | model 2 | Date | 928.42 | 30.390280 | 10 | 691.525424 | 0.000000 | 108.474576 | 112.500000 | 112.500000 |
| 2 | 22 | model 2 | End Time | 226.30 | 15.679129 | 7 | 583.050847 | 112.500000 | 72.316384 | 118.125000 | 230.625000 |
| 3 | 19 | model 2 | Contact Details | 79.93 | 8.922950 | 5 | 655.367232 | 112.500000 | 103.309120 | 59.062500 | 171.562500 |
| 4 | 20 | model 2 | Budget | 109.86 | 10.709168 | 5 | 655.367232 | 171.562500 | 103.309120 | 59.062500 | 230.625000 |
| 5 | 14 | model 2 | Start Time | 95.46 | 6.521160 | 4 | 758.676352 | 112.500000 | 41.323648 | 118.125000 | 230.625000 |
| 0 | 38 | model 3 | Number of Attendees | 788.75 | 19.343866 | 10 | 583.050847 | 230.625000 | 114.183764 | 106.875000 | 337.500000 |
| 1 | 28 | model 3 | Budget | 679.74 | 66.261149 | 9 | 697.234612 | 230.625000 | 102.765388 | 106.875000 | 337.500000 |
| 2 | 26 | model 3 | Organizer | 521.00 | 28.964237 | 7 | 583.050847 | 337.500000 | 75.932203 | 112.500000 | 450.000000 |
| 3 | 21 | model 3 | Date | 379.74 | 12.430155 | 6 | 658.983051 | 337.500000 | 65.084746 | 112.500000 | 450.000000 |
| 4 | 15 | model 3 | Contact Details | 164.96 | 18.415236 | 4 | 724.067797 | 337.500000 | 75.932203 | 64.285714 | 401.785714 |
| 5 | 2 | model 3 | Start Time | 9.59 | 0.655122 | 2 | 724.067797 | 401.785714 | 50.621469 | 48.214286 | 450.000000 |
| 6 | 0 | model 3 | End Time | 13.25 | 0.918022 | 1 | 774.689266 | 401.785714 | 25.310734 | 48.214286 | 450.000000 |
p = figure(width=w, height=h, tooltips="@label", toolbar_location=None,
           x_axis_location=None, y_axis_location=None)
p.x_range.range_padding = p.y_range.range_padding = 0
p.grid.grid_line_color = None
# Color the blocks by model: six models need a six-color palette and the model names as factors
p.block('x', 'y', 'dx', 'dy', source=blocks, line_width=1, line_color="white",
        fill_alpha=0.8, fill_color=factor_cmap("model", "MediumContrast6", models))
p.text('x', 'y', x_offset=2, text="model", source=blocks_by_model,
       text_font_size="18pt", text_color="white")
blocks["ytop"] = blocks.y + blocks.dy
# One text color per model; adjust the sequence if a label is hard to read on its block
p.text('x', 'ytop', x_offset=2, y_offset=2, text="label", source=blocks,
       text_font_size="6pt", text_baseline="top",
       text_color=factor_cmap("model", ("black", "white", "black", "white", "black", "white"), models))
show(p)
df.head()
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
df_correct_prediction.head()
| | label | questions | answer | score | model | percentage | actual_answer | comparing_answers |
|---|---|---|---|---|---|---|---|---|
| 0 | Number of Attendees | what Number of Attendees? | 15 | 62.77 | model 1 | 1.539416 | 15 | True |
| 1 | Number of Attendees | who Number of Attendees? | 15 | 67.06 | model 1 | 1.644627 | 15 | True |
| 2 | Number of Attendees | where Number of Attendees? | 15 | 46.99 | model 1 | 1.152416 | 15 | True |
| 3 | Number of Attendees | when Number of Attendees? | 15 | 55.51 | model 1 | 1.361367 | 15 | True |
| 4 | Number of Attendees | why Number of Attendees? | 15 | 55.14 | model 1 | 1.352293 | 15 | True |
df_correct_prediction['occurence'] = 1
df_correct_prediction.head()
| | label | questions | answer | score | model | percentage | actual_answer | comparing_answers | occurence |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Number of Attendees | what Number of Attendees? | 15 | 62.77 | model 1 | 1.539416 | 15 | True | 1 |
| 1 | Number of Attendees | who Number of Attendees? | 15 | 67.06 | model 1 | 1.644627 | 15 | True | 1 |
| 2 | Number of Attendees | where Number of Attendees? | 15 | 46.99 | model 1 | 1.152416 | 15 | True | 1 |
| 3 | Number of Attendees | when Number of Attendees? | 15 | 55.51 | model 1 | 1.361367 | 15 | True | 1 |
| 4 | Number of Attendees | why Number of Attendees? | 15 | 55.14 | model 1 | 1.352293 | 15 | True | 1 |
After seeing how much harder it was to set up a treemap in Bokeh than in Plotly, I chose to do the rest of my visualizations in Plotly for its ease of use.
Here I visualize how often each model predicts the correct answer for each label, to see which model performs best overall.
fig = px.treemap(df_correct_prediction, path=[px.Constant("all"), 'label', 'model', 'actual_answer'], values='occurence', title="Prediction occurrence of each label based on type of model")
fig.update_traces(root_color="lightgrey", marker=dict(cornerradius=5))
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
fig.show()
fig = px.treemap(df_correct_prediction, path=[px.Constant("all"), 'model', 'label'], values='occurence', title="Prediction occurrence of each model based on each form label")
fig.update_traces(root_color="lightgrey")
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
fig.show()
Here I visualize each model's total confidence score on correctly predicted answers for each label, to see which model performs best overall.
fig = px.treemap(df_correct_prediction, path=[px.Constant("all"), 'label', 'model', 'actual_answer'], values='score', title="Prediction confidence score of each label based on type of model")
fig.update_traces(root_color="lightgrey")
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
fig.show()
fig = px.treemap(df_correct_prediction, path=[px.Constant("all"), 'model', 'label'], values='score', title="Prediction confidence score of each model based on each form label")
fig.update_traces(root_color="lightgrey")
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
fig.show()
df_correct_prediction.head()
| | label | questions | answer | score | model | percentage | actual_answer | comparing_answers | occurence |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Number of Attendees | what Number of Attendees? | 15 | 62.77 | model 1 | 1.539416 | 15 | True | 1 |
| 1 | Number of Attendees | who Number of Attendees? | 15 | 67.06 | model 1 | 1.644627 | 15 | True | 1 |
| 2 | Number of Attendees | where Number of Attendees? | 15 | 46.99 | model 1 | 1.152416 | 15 | True | 1 |
| 3 | Number of Attendees | when Number of Attendees? | 15 | 55.51 | model 1 | 1.361367 | 15 | True | 1 |
| 4 | Number of Attendees | why Number of Attendees? | 15 | 55.14 | model 1 | 1.352293 | 15 | True | 1 |
fig = px.pie(df_correct_prediction, values='occurence', names='label', color_discrete_sequence=px.colors.sequential.RdBu, title='Occurrence of Predicting Correct Answer')
fig.show()
Dot plots (also known as Cleveland dot plots) are scatter plots with one categorical axis and one continuous axis. They can be used to show changes between two (or more) points in time or between two (or more) conditions. Compared to a bar chart, dot plots can be less cluttered and allow for an easier comparison between conditions.
fig = px.scatter(score_by_label.sort_values('model'), y="label", x="comparing_answers", color="model", symbol="model", title='Number of correct predictions per label for each model')
fig.update_traces(marker_size=10)
fig.show()
Here I visualize how often each model predicts the correct answer for each label, to see which model performs best overall, using a horizontal bar chart.
fig = px.bar(score_by_label.sort_values('model'), x="comparing_answers", y="label", color='model', orientation='h',
hover_data=["comparing_answers", "score"],
height=400,
title='Number of correct predictions per label for each model')
fig.show()
Here I visualize each model's total confidence score on correctly predicted answers for each label, to see which model performs best overall, using a horizontal bar chart.
fig = px.bar(score_by_label.sort_values('model'), x="score", y="label", color='model', orientation='h',
hover_data=["comparing_answers", "score"],
height=400,
title='Total confidence score per label for each model')
fig.show()
Sunburst plots visualize hierarchical data spanning outwards radially from root to leaves. Similar to Icicle charts and Treemaps, the hierarchy is defined by labels (names for px.sunburst) and parents attributes. The root starts from the center and children are added to the outer rings.
fig = px.sunburst(score_by_label, path=['label', 'model'], values='comparing_answers')
fig.show()
fig = px.sunburst(score_by_label, path=['label', 'model'], values='score', width=1000, height=500, title='Prediction confidence score of each label based on type of model')
fig.show()
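The `path=['label', 'model']` argument describes a two-level hierarchy: conceptually every sector has an id, a parent, and a value. The sketch below shows how such parallel lists could be derived from flat rows. It illustrates the idea only and is not Plotly's internal implementation. The helper `build_hierarchy` and the example rows are hypothetical:

```python
from collections import defaultdict

def build_hierarchy(rows):
    """Turn (label, model, value) rows into parallel ids/parents/values lists."""
    totals = defaultdict(int)
    for label, _, value in rows:
        totals[label] += value  # a parent's value is the sum of its children

    ids, parents, values = [], [], []
    for label, total in totals.items():  # inner ring: one sector per form label
        ids.append(label)
        parents.append("")               # an empty parent marks a top-level sector
        values.append(total)
    for label, model, value in rows:     # outer ring: one sector per (label, model)
        ids.append(f"{label}/{model}")
        parents.append(label)
        values.append(value)
    return ids, parents, values

# Hypothetical rows, for illustration only.
rows = [("Date", "model 1", 10), ("Date", "model 2", 10), ("Budget", "model 2", 5)]
ids, parents, values = build_hierarchy(rows)
print(list(zip(ids, parents, values)))
```

Swapping the order of the two levels (model first, then label) would give the model-centred view used in some of the treemaps earlier in the notebook.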
Icicle charts visualize hierarchical data using rectangular sectors that cascade from root to leaves in one of four directions: up, down, left, or right. Similar to Sunburst charts and Treemap charts, the hierarchy is defined by labels (names for px.icicle) and parents attributes. Click on one sector to zoom in or out, which also displays a pathbar on the top of your icicle. To zoom out, you can click the parent sector or click the pathbar as well.
fig = px.icicle(score_by_label, path=[px.Constant("Lunch Form"), 'label', 'model'], values='comparing_answers')
fig.update_traces(root_color="lightgrey")
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
fig.show()
fig = px.area(score_by_label, x="label", y="comparing_answers", color="model", pattern_shape="model")
fig.show()
fig.write_html(r"C:\Users\victo\source\repos\Semester 7\JupyterLab\Data Visualization\file.html")
plotly.offline.init_notebook_mode()
To conclude, this notebook shows different visualization plots that express the data collected.